MapReduce Online

نویسندگان

  • Tyson Condie
  • Neil Conway
  • Peter Alvaro
  • Joseph M. Hellerstein
  • Khaled Elmeleegy
  • Russell Sears
چکیده

MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, many implementations of MapReduce materialize the entire output of each map and reduce task before it can be consumed. In this paper, we propose a modified MapReduce architecture that allows data to be pipelined between operators. This extends the MapReduce programming model beyond batch processing, and can reduce completion times and improve system utilization for batch jobs as well. We present a modified version of the Hadoop MapReduce framework that supports online aggregation, which allows users to see “early returns” from a job as it is being computed. Our Hadoop Online Prototype (HOP) also supports continuous queries, which enable MapReduce programs to be written for applications such as event monitoring and stream processing. HOP retains the fault tolerance properties of Hadoop and can run unmodified user-defined MapReduce programs.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Online Integrated Development Environment for MapReduce Programming

Though MapReduce programming model simplifies the development of parallel program, ordinary users have difficulties in setting up the development environment for MapReduce. The online integrated development environment for MapReduce programming can solve this problem, thus users need not build the environment themselves, only need to focus on the logical design of the parallel program. During t...

متن کامل

Online Aggregation for Large MapReduce Jobs

In online aggregation, a database system processes a user’s aggregation query in an online fashion. At all times during processing, the system gives the user an estimate of the final query result, with the confidence bounds that become tighter over time. In this paper, we consider how online aggregation can be built into a MapReduce system for large-scale data processing. Given the MapReduce pa...

متن کامل

MROrder: Flexible Job Ordering Optimization for Online MapReduce Workloads

MapReduce has become a widely used computing model for largescale data processing in clusters and data centers. A MapReduce workload generally contains multiple jobs. Due to the general execution constraints that map tasks are executed before reduce tasks, different job execution orders in a MapReduce workload can have significantly different performance and system utilization. This paper propo...

متن کامل

MapReduce with Deltas

The MapReduce programming model is extended conservatively to deal with deltas for input data such that recurrent MapReduce computations can be more efficient for the case of input data that changes only slightly over time. That is, the extended model enables more frequent re-execution of MapReduce computations and thereby more up-to-date results in practical applications. Deltas can also be pu...

متن کامل

Beyond Online Aggregation: Parallel and Incremental Data Mining with Online Map-Reduce (DRAFT)

There are only few data mining algorithms that work in a massively parallel and yet online (i.e. incremental) fashion. A combination of both features is essential for mining of large data streams and adds scalability to the concept of Online Aggregation introduced by J. M. Hellerstein et al. in 1997. We show how an online version of the MapReduce programming model can be used to implement such ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010